Analytical performance estimation during code generation on modern GPUs

Abstract

Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations, tuning parameters, and parallelization strategies. We propose an alternative to time-intensive autotuning, scenario-specific performance models, or black-box machine learning to select the best-performing configuration. This paper identifies the relevant performance-defining mechanisms for memory-intensive GPU applications through a performance model coupled with an analytic metric estimator. This enables a quick exploration of large configuration spaces to identify highly efficient code candidates with high accuracy. We examine the changes of the A100 GPU architecture compared to its predecessor, the V100, and address the challenges of how to model the data transfer volumes through the new memory hierarchy. We show how our method can be coupled to the “pystencils” stencil code generator, which is used to generate kernels for a range-four 3D-25pt stencil and a complex two-phase fluid solver based on the Lattice Boltzmann Method. For both, it delivers a ranking that can be used to select the best-performing candidate. The method is not limited to stencil kernels but can be integrated into any code generator that can generate the required address expressions.
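To make the idea of ranking candidates from estimated data transfer volumes concrete, here is a minimal sketch. It is not the paper's model: it assumes a simple bandwidth-limit estimate in which the predicted time of a candidate is set by the slowest memory level. The names `Candidate`, `estimated_time`, and `rank`, the level names, and all numbers are hypothetical placeholders in arbitrary units, not measurements of a V100 or A100.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical kernel configuration (e.g. a thread block shape)."""
    name: str
    # Estimated data volume per grid-point update at each memory level
    # (arbitrary units; placeholder values for illustration only).
    volumes: dict


def estimated_time(candidate, bandwidth):
    """Bandwidth-limit estimate: the slowest memory level sets the time."""
    return max(candidate.volumes[level] / bandwidth[level] for level in bandwidth)


def rank(candidates, bandwidth):
    """Sort candidates from lowest to highest estimated time."""
    return sorted(candidates, key=lambda c: estimated_time(c, bandwidth))


# Hypothetical bandwidths per memory level (arbitrary units per time unit).
BANDWIDTH = {"L1": 8.0, "L2": 4.0, "DRAM": 1.0}

CANDIDATES = [
    Candidate("A", {"L1": 64.0, "L2": 24.0, "DRAM": 16.0}),
    Candidate("B", {"L1": 64.0, "L2": 40.0, "DRAM": 24.0}),
    Candidate("C", {"L1": 160.0, "L2": 32.0, "DRAM": 16.0}),
]

if __name__ == "__main__":
    for c in rank(CANDIDATES, BANDWIDTH):
        print(c.name, estimated_time(c, BANDWIDTH))
```

In this toy setup the ranking needs only the estimated volumes, so no kernel has to be compiled or run. That is the property that makes an analytic estimate attractive compared to autotuning, under the assumption that the volume estimates are accurate enough to separate the candidates.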

Similar resources

Towards Automatic Code Generation for GPUs

Abstract. Graphics Processing Units (GPUs) have become highly parallel and programmable systems used as commodity data-parallel coprocessors. Moreover manufacturers have developed new software interfaces that facilitate their use. Thus, new compilation strategies that enable automatic mapping of sequential code would very likely arise in the near future. To open this path, we need to define som...

GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs

Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through t...

Evaluation of state-of-the-art polyhedral tools for automatic code generation on GPUs

At present, multi-core and manycore platforms lead the computer industry, forcing software developers to adopt new programming paradigms, in order to fully exploit their computing capabilities. Nowadays, Graphics Processing Units (GPUs) are one of representatives of many-core architectures, and certainly the most widespread. This paper evaluates and compares tool frameworks that automatically g...

High-Quality Point-Based Rendering on Modern GPUs

In the last years point-based rendering has been shown to offer the potential to outperform traditional triangle based rendering both in speed and visual quality when it comes to processing highly complex models. Existing surface splatting techniques achieve superior visual quality by proper filtering but they are still limited in rendering speed. On the other hand the increasing availability a...

Journal

Journal title: Journal of Parallel and Distributed Computing

Year: 2023

ISSN: 1096-0848, 0743-7315

DOI: https://doi.org/10.1016/j.jpdc.2022.11.003